(54) Apparatus and method for depersonalizing information



(57) A computer implemented method allows an owner or provider of data that contains personal identifiers (data provider) to distribute that data to a data user in a depersonalized form, i.e., without revealing the identity of the individuals associated with the data. The data provider first separates the personal information from the other data to create two data sets. The personal identifying information is then provided to a Trusted Third Party (TTP). The TTP associates a unique identifier with the identifying information. This unique identifier replaces any data in the database that can be used to identify an individual, such as name, address or social security number. The TTP may also collect and store the personal identifying information so that it can process identifying information that it acquires in the future to determine if the unique identifiers generated by the data provider or by the TTP refer to the same individual. The data provider associates its own unique identifier or the identifier provided by the TTP with the other data to create depersonalized data that may be sent to a data user for analysis. In this manner, different records from one or more data providers that refer to a single individual can be matched by the data user, and the data provider is assured that no personal identifying information is distributed that would link an individual to a particular data record. The TTP transmits information that correlates unique identifiers from multiple data providers to a data user. Each data provider transmits the depersonalized data, including the unique identifiers, to the data user. The data user correlates the information from the different data providers before analyzing the data.
Description

BACKGROUND OF THE INVENTION 

[0001] The present invention concerns the depersonalization of data associated with a particular individual and, in particular, a method for depersonalizing data from several sources without disclosing the personalized data.
[0002] In modern society, information relating to specific individuals is obtained by numerous organizations. Healthcare, financial and commercial organizations such as hospitals, laboratories, banks, insurance companies and retailers own data that could be used for research and development, marketing, and other business functions. There is, however, a growing awareness of the necessity to maintain the privacy of the individuals connected with the data. In particular, information regarding an individual's health or financial status may be extremely sensitive.

[0003] The analysis of this information often requires accessing data from multiple sources. For example, a study to determine the effectiveness of a particular medication may need to access records from a group of caregivers that prescribe the medication and from a corresponding group of pharmacies that dispense the medication. The data owned by each of the data providers contains sensitive information that they may be unable to share with the data user who will be analyzing the information. While the various data providers could remove any identifying information from their data and provide only the medical data to the data user, the data user would not be able to correlate the data from the various sources and, thus, would lose information that would be needed in the analysis.

[0004] Therefore, a need has arisen for a method for obtaining personal data from multiple sources without the ability to identify the individual associated with the data but with the ability to associate individual data items from multiple sources as relating to a single individual.

SUMMARY OF THE INVENTION 

[0005] The present invention relates to a computer implemented method and apparatus that allows an owner or provider of data that contains personal identifiers (data provider) to distribute that data to a data user in a depersonalized form, i.e., without revealing the identity of the individuals associated with the data. The data is otherwise unchanged. According to this method, a data provider separates the personal information from the other data to create two data sets. Only the personal identifying information is provided to a Trusted Third Party (TTP). The TTP generates an identifier that replaces any data in the database that can be used to identify an individual, such as name, address or social security number. The TTP may also collect and store the personal identifying information so that it can process identifying information that it acquires in the future to determine if the identifiers generated by the data provider or by the TTP refer to the same individual. The data provider associates the identifier provided by the TTP with the other data to create depersonalized data that may be sent to a data user for analysis. In this manner, different records from one or more data providers that refer to a single individual can be matched by the data user, and the data provider is assured that no personal identifying information is distributed that would link an individual to a particular data record.

DETAILED DESCRIPTION OF THE DRAWINGS

[0006]

Figure 1 is a data flow diagram which is useful for describing how data is transferred among the various parties in the subject invention.

Figure 2 is a data flow diagram which illustrates one exemplary data depersonalization method.

Figure 3 is a data flow diagram that illustrates a second exemplary data depersonalization method.
Figure 4 is a data flow diagram that illustrates a third exemplary data depersonalization method.
Figure 5 is a data flow diagram that illustrates a fourth exemplary data depersonalization method.
Figure 6 is a data flow diagram that shows how multiple data providers may interact with a trusted third party to provide data that may be correlated by one or more data users.

Figure 7 is a block diagram that shows an exemplary computer configuration that may be used to implement the methods described in Figures 1 through 6.

Figure 8 is a flow-chart diagram of an exemplary method of Figure 6.
Figure 9 is a flow-chart diagram of an exemplary method of Figures 3, 4 or 5.

DETAILED DESCRIPTION OF THE INVENTION

[0007] Briefly, the present invention is a method and apparatus for processing sensitive information that identifies a person, so that it may be used for anonymous data analysis. In the embodiments of the invention described below, a data provider, who owns a database containing sensitive information, divides the information into two parts: identifying information and other information. Using the identifying information, the provider generates, or has generated for it, a unique identifier that is linked to the identification information in the data provider's database. The data owner then tags the other information with this unique identifier and provides the tagged data to the data user. In each of the embodiments described below, the unique identifier is generated by or registered with a Trusted Third Party (TTP) who is able to match the identifying information received from the data provider to other identifying information that may already be in the TTP's database. A TTP is an entity that is under a contractual agreement to protect the identifying information from being disclosed, while maintaining and processing the data as necessary. By matching the identifying information, the TTP can link identifiers that are associated with data from multiple providers. These links may be provided directly to the data users to allow the data users to correlate data from multiple sources.

[0008] In the subject application, the word "depersonalizing" is used to describe the process by which the identifying information is removed from a user data record and replaced by a unique identifier. This term encompasses the terms "anonymizing" and "encoding" as they are used in the data processing arts. When data is anonymized, or encoded, all identifying information is removed from a record and a truly random identifier is assigned to represent the person. In addition, the term "depersonalizing" also encompasses a process by which an identifier that is not truly random replaces the personal identifying information in a data record. An identifier of this type may be, for example, a hash function value or other value produced from a predetermined subset of the identifying information.
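The two kinds of identifier described above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the field names, the use of `secrets.token_hex` for the truly random identifier and of an unkeyed SHA-256 hash for the non-random one are assumptions chosen for the example.

```python
import hashlib
import secrets

def random_identifier() -> str:
    """Truly random identifier: carries no information derived from the person."""
    return secrets.token_hex(16)  # 128 random bits as 32 hex characters

def derived_identifier(record: dict, subset=("name", "ssn")) -> str:
    """Not truly random: a hash of a predetermined subset of identifying fields.

    The field names in `subset` are hypothetical. Values are trimmed and
    upper-cased so trivially different spellings hash to the same value.
    """
    material = "|".join(str(record[f]).strip().upper() for f in subset)
    return hashlib.sha256(material.encode("utf-8")).hexdigest()

rec = {"name": "Robert Smith", "ssn": "123456789", "diagnosis": "J45"}
print(random_identifier())       # differs on every call
print(derived_identifier(rec))   # identical for identical name and SSN
```

A hash of low-entropy fields such as a social security number can be reversed by enumerating candidate inputs, so a derived identifier of this kind only suits less sensitive data; a keyed hash with a secret held by the TTP would be a stronger variant.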
[0009] Fig. 1 shows a high-level data flow diagram of an exemplary information network 110 with which the principles of the present invention may be used. In this exemplary embodiment, a data provider 112 owns or controls a database 114 which, for example, is organized as a plurality of data records, each record containing one or more data fields. The data for each person may be kept in a single record or it may be linked across multiple records. Fields or portions of the fields in each record contain data that can be used to identify the individual, namely, personal identifiable attributes. These attributes include, for example, "name," "address" and "social security number." This is an exemplary and not exhaustive listing of the identifiable attributes.

[0010] In addition to the identifying information, the database contains other information about the individual. This "other information" may include, for example, medical information, financial data, purchase activity information or website navigation data. The database record may also include demographic data that is ordinarily non-identifying, for example, the person's occupation, postal code or telephone area code. Depending on the type of "other information" in the database record, however, some of this demographic information may be classified as identifying information. For example, if the data record includes sensitive medical information, then the entire postal code may be considered identifying information, while a partial postal code, for example the first three digits of a five-digit ZIP code, would not be identifying information.
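The postal-code example can be expressed as a small rule. The sketch below assumes five-digit U.S. ZIP codes and a boolean flag marking a record as sensitive; both the function name and the flag are hypothetical, and a real data provider would decide field by field which attributes count as identifying.

```python
from typing import Optional

def releasable_postal_code(zip_code: str, record_is_sensitive: bool) -> Optional[str]:
    """Return the portion of a ZIP code that may travel with the 'other information'.

    For sensitive records the full five-digit code is treated as identifying,
    so only the first three digits are released.
    """
    digits = zip_code.strip()[:5]          # drop any ZIP+4 suffix
    if len(digits) != 5 or not digits.isdigit():
        return None                        # malformed: release nothing rather than guess
    return digits[:3] if record_is_sensitive else digits

print(releasable_postal_code("08540-1234", record_is_sensitive=True))   # 085
print(releasable_postal_code("08540", record_is_sensitive=False))       # 08540
```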

[0011] Because the type of information that may be considered to be identifying information varies with the type of data stored in the database, the data provider is best able to decide which information in the person's record is considered to be identifying information and which information may be passed on to a data user for analysis. The data provider 112 creates a file 113 from the database; each record of the file contains the fields having the identifiable attributes from the corresponding record in the database. The file 113 is sent to a Trusted Third Party (TTP) 116. The TTP 116 creates a unique identifier to be associated with the identifying attributes. This identifier can be alphabetic, numeric, alphanumeric, symbolic and the like. If the data in the database is sensitive, the unique identifier may be generated in a totally random fashion and in a manner that cannot be reversed, for example by taking the instantaneous value of the system clock register. If the data in the database is less confidential, the unique identifier may be generated from the identifying information by a reversible process.

[0012] To generate the unique identifier, the TTP 116 first compares the identifying data from a record in the file to records in an internal database 115 that contains identifying information which has previously been processed by the TTP. Each record of this database also contains a source identifier that identifies the data provider who owns the data associated with the identifying record, and links to other records in the database that contain matching identifying information. If the TTP finds a match in its internal database and the source of the previous data is the supplier of the current data, then the TTP 116 uses the previously assigned unique identifier as the identifier for the new data. If the source of the previous data was not the supplier of the current data, or if the TTP does not find a match for the data in its database, a new unique identifier is generated for the data set. Each unique identifier is specific to the data provider.

[0013] By assigning a different unique identifier to represent the same person for respectively different data providers, the TTP ensures that one data provider cannot identify any data owned by another provider. Because each data provider has identifying information for all of the people in its database, if the same unique identifier were used for multiple providers, one provider could link its identifying information to depersonalized data that is owned by a different data supplier. This may result in a breach of confidentiality for that data.
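The assignment rule of paragraphs [0012] and [0013] (reuse an identifier only when the same provider re-submits the same person, otherwise issue a new provider-specific one, and record the links between providers) can be sketched as below. It is an illustration under simplifying assumptions: the matching step is reduced to exact equality of a pre-normalized `identity_key`, in-memory dictionaries stand in for the internal database 115, and the class and method names are hypothetical.

```python
import secrets
from dataclasses import dataclass, field

@dataclass
class TrustedThirdParty:
    # (identity_key, provider) -> identifier; stands in for internal database 115
    ids: dict = field(default_factory=dict)
    # identity_key -> {provider: identifier}; the links between providers' identifiers
    links: dict = field(default_factory=dict)
    issued: set = field(default_factory=set)

    def _new_identifier(self) -> str:
        while True:                          # guarantee uniqueness across all providers
            uid = secrets.token_hex(8)
            if uid not in self.issued:
                self.issued.add(uid)
                return uid

    def assign(self, identity_key: str, provider: str) -> str:
        """Return the identifier for this person as seen by this provider."""
        existing = self.ids.get((identity_key, provider))
        if existing is not None:
            return existing                  # same person, same provider: reuse
        uid = self._new_identifier()         # new person, or a different provider
        self.ids[(identity_key, provider)] = uid
        self.links.setdefault(identity_key, {})[provider] = uid
        return uid

    def correlate(self, uid: str) -> dict:
        """All providers' identifiers that refer to the same person as `uid`."""
        for per_provider in self.links.values():
            if uid in per_provider.values():
                return dict(per_provider)
        return {}

ttp = TrustedThirdParty()
h = ttp.assign("SMITH|ROBERT|1960-01-02", "hospital")
p = ttp.assign("SMITH|ROBERT|1960-01-02", "pharmacy")
print(h != p, ttp.correlate(h) == {"hospital": h, "pharmacy": p})  # True True
```

Only the TTP holds `links`; giving each provider a different identifier is what prevents one provider from recognizing its own patients in another provider's depersonalized data.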

[0014] After retrieving or creating the unique identifier, the TTP stores it into a field of the appropriate record in the file 113. When all of the records have been processed, the TTP 116 returns the file 113 to the data provider 112. The data provider creates a new database 120 containing the records of the original database from which the identifiable attributes are removed and replaced with the unique identifier. The database 120, containing the random identifiers along with the data not determined to be personal identifying attributes, is then sent to the data user 118. The data user now has useful data that has been depersonalized so that the data user does not have the ability to identify an individual that matches a particular set of data.
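On the data provider's side, the round trip of paragraphs [0011] and [0014] amounts to splitting each record into an identifying part (file 113) and a remainder, then re-attaching the identifiers the TTP returns. The sketch assumes records are Python dictionaries, that the provider keys both parts by row number, and that the set of identifying field names is chosen by the provider; all names are illustrative.

```python
# Chosen by the data provider; hypothetical field names.
IDENTIFYING_FIELDS = {"name", "address", "ssn"}

def split_database(db):
    """Split database 114 into file 113 (identifying attributes) and the other data.

    Both parts are keyed by row number so they can be rejoined later.
    """
    file_113, other = {}, {}
    for row, record in enumerate(db):
        file_113[row] = {k: v for k, v in record.items() if k in IDENTIFYING_FIELDS}
        other[row] = {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}
    return file_113, other

def build_depersonalized(other, uid_by_row):
    """Build database 120: the other data tagged with the TTP's unique identifiers."""
    return [dict(other[row], uid=uid_by_row[row]) for row in sorted(other)]

db_114 = [
    {"name": "Robert Smith", "ssn": "123456789", "diagnosis": "J45"},
    {"name": "Ann Jones", "ssn": "987654321", "diagnosis": "E11"},
]
file_113, other = split_database(db_114)
# Stand-in for the identifiers the TTP writes into file 113 before returning it.
returned = {row: f"UID-{row}" for row in file_113}
print(build_depersonalized(other, returned))
# [{'diagnosis': 'J45', 'uid': 'UID-0'}, {'diagnosis': 'E11', 'uid': 'UID-1'}]
```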

[0015] For sensitive data, it is desirable for the TTP 116 to protect the relationship between the personal identifying information and the unique identifiers. For this type of information, the random identifiers provided by the TTP 116 are desirably totally random; there should be no way for anyone other than the data provider 112 or the TTP 116 to relate the identifier to the individual. Only in the circumstance where the data provider 112 has authority to grant, and grants, specific permission should the data user be able to obtain identifying information for any data in its possession. In this exemplary embodiment, an individual may have multiple records within the database owned or controlled by the data provider. In addition, as set forth above, the TTP 116 may have data on one person from multiple data providers. In order to link newly received personal data to data already in the database 115, the TTP 116 executes a matching algorithm on the data that it receives. In any scenario in which a data user requires data from multiple providers, a TTP 116 is necessary.

[0016] Many matching algorithms may be used in the present invention. Exemplary matching algorithms are disclosed in a paper by M. A. Jaro entitled "Probabilistic Linkage of Large Public Health Data Files," Statistics in Medicine, vol. 14, John Wiley, pp. 491-498 (1995), and in an article by I. P. Fellegi et al. entitled "A Theory for Record Linkage," Journal of the American Statistical Association, vol. 64, No. 328, pp. 1183-1210 (1969). The simplest matching algorithm is a deterministic match. By this algorithm, individual data fields from the newly received personal data are compared to corresponding fields in the data from the database 115. If all of these fields match, then the newly received data is almost certainly for the person whose data is in the database. An exemplary set of fields that may be used for a deterministic match is Last Name, First Name, Address and Social Security Number. Other fields such as Telephone Number and Birth Date may also be used.

[0017] Deterministic matching techniques may not identify all matches, or even a large percentage of matches, between two databases because of incomplete data or transcription errors. One method for enhancing deterministic matching techniques is to employ probabilistic techniques to determine the likelihood that two dissimilar fields match. Another technique is to normalize the data, for example by expanding abbreviations and nicknames, before performing the deterministic match or applying the probabilistic techniques. Yet another method is to analyze dissimilar fields in otherwise matching records by their edit distances to identify possible errors in transcription.
[0018] One exemplary data matching technique is presented below. This method is disclosed in copending U.S. patent application No. 60/165,121, filed 15 November 1999, and is one of many possible matching methods that may be used. The materials disclosed therein are incorporated by reference herein to the extent they are material to the understanding and practice of this invention. The exemplary matching technique comprises three steps: i) data standardization, ii) weight estimation, and iii) data comparison.


Definitions

[0019] The following definitions and abbreviations are used for this exemplary embodiment:

$\mu$-Probability: The probability that any random element pair will match by chance, as given by equation (1).

(1) 



$p$-Probability: The reliability of the data element. If the Element Error Rate $e \geq .99$, then $p = 1 - e$; otherwise $p = .99 - e$.

Agreement: A condition such that a given element pair matches exactly and both elements are known, i.e., $A_n = B_n$.



Agreement Weight: The weight assigned to an element pair when they agree during the record matching process, as shown in equation (2).

$$AW = \log_2\left(\frac{p}{\mu}\right) \tag{2}$$
Cartesian Product: The set of ordered pairs $A \times B = \{(a, b) \mid a \in A \land b \in B\}$.

Disagreement: A condition such that a given element pair does not match exactly and both elements are known.



Disagreement Weight: The weight assigned to an element pair when they disagree during the record matching process, as shown in equation (3).

$$DW = \log_2\left(\frac{1 - p}{1 - \mu}\right) \tag{3}$$

Element Error Rate: The proportion of element pairs where at least one element is unknown, e.g., null, as shown in equation (4).

$$e = \frac{\text{number of element pairs with at least one unknown element}}{|A \times B|} \tag{4}$$



Frequency Table: Summary of the number of times, and percentage of the total, that different values of a variable occur.
Mean: Arithmetic average, as given in equation (5).

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \tag{5}$$

No Decision: A condition in which either one or both of the elements of a given element pair are unknown.
Random Number Assignment: In the exemplary embodiment of the invention, every record in the data set is assigned a random number such that $U$ blocks of approximately 1500 are created: $R = \operatorname{int}[(U \times P) + 1]$, where $R$ is the resulting Random Number, $U$ is the Upper Bound (defined below) and $P$ is a random function that returns a value between 0 and 1. In the exemplary embodiment of the invention, $P$ may be a pseudo random number generator.

Threshold: The threshold utilized in probabilistic matching is a binit odds ratio with a range of $-\infty$ to $\infty$.

Upper Bound: Number of strata such that the data set is divided into approximately equal blocks of 1500 rows, as shown in equation (6).

$$U = \operatorname{int}\left(\frac{\text{Number of Records in Data Set}}{1500}\right) \tag{6}$$

As regards the computer and machine language used in this process, just about any piece of hardware capable of executing a fairly large number of calculations in short order will fill the bill. Any current state-of-the-art PC or server could be used. As for the operating system, UNIX is preferred, but Windows 98, Windows NT or the like could be used. The source code can be written in any language, though Java is preferred.
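Equation (6) and the Random Number Assignment definition can be combined into a short routine. The sketch follows the definitions above, taking P from Python's `random.random()`, which returns a float in [0, 1); the `max(1, ...)` guard, which keeps a file of fewer than 1500 records in a single stratum instead of giving U = 0, is an addition not stated in the text.

```python
import random

BLOCK_SIZE = 1500  # target rows per stratum, from the definitions above

def upper_bound(n_records: int) -> int:
    """Equation (6): U = int(n / 1500), floored at 1 (our assumption)."""
    return max(1, n_records // BLOCK_SIZE)

def assign_random_numbers(n_records: int, rng=random) -> list:
    """R = int(U * P) + 1 for each record, so every R lies in 1..U."""
    u = upper_bound(n_records)
    return [int(u * rng.random()) + 1 for _ in range(n_records)]

numbers = assign_random_numbers(6000, random.Random(42))
print(upper_bound(6000), min(numbers), max(numbers))
```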

Data Standardization

[0020] The first step of this process involves the standardization of data in an input file. This standardization is required for increased precision and reliability. The input file can contain any number of variables, of which one or more are or may be unique to a particular data source such as an individual. Examples of useful variables are: member identifier, driver's license number, social security number, insurance company code number, name, gender, date of birth, street address, city, state, postal code and citizenship. In addition, some identifiers can be further distilled down into more basic, or atomic, components. For example, a name may be broken down into atomic components of first name, last name and middle initial.

EP 1 026 603 A2

[0021] During the standardization process, all character data is preferably transformed to a single case, and all abbreviations or nicknames are transformed to their longer forms. For example, all letters may be transformed to uppercase, so that first names are standardized as, e.g., {BOB, ROB, ROBBY} → ROBERT. Common names for cities and streets may be transformed to the postal standard, e.g., in the U.S., to the United States Postal Service standard. In the latter instance, this can be performed using industry-standard CASS-certified software.
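The standardization rules above can be sketched as a few pure functions. The nickname and abbreviation tables here are tiny hypothetical samples (a production system would use curated reference lists, and CASS-certified software for postal addresses), and the name splitter assumes a simple "first [middle] last" order.

```python
import re

# Hypothetical, deliberately small lookup tables.
NICKNAMES = {"BOB": "ROBERT", "ROB": "ROBERT", "ROBBY": "ROBERT", "BILL": "WILLIAM"}
ABBREVIATIONS = {"ST": "STREET", "AVE": "AVENUE", "RD": "ROAD"}

def standardize_first_name(name: str) -> str:
    """Upper-case and map nicknames to their longer forms."""
    n = name.strip().upper()
    return NICKNAMES.get(n, n)

def standardize_street(address: str) -> str:
    """Upper-case, drop punctuation and expand abbreviations token by token."""
    tokens = re.sub(r"[.,]", " ", address.upper()).split()
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)

def atomic_name(full_name: str) -> dict:
    """Break a name into first name, middle initial and last name."""
    parts = full_name.replace(".", " ").upper().split()
    return {
        "first": standardize_first_name(parts[0]) if parts else "",
        "middle_initial": parts[1][0] if len(parts) > 2 else "",
        "last": parts[-1] if len(parts) > 1 else "",
    }

print(atomic_name("Bob J. Smith"))
# {'first': 'ROBERT', 'middle_initial': 'J', 'last': 'SMITH'}
print(standardize_street("12 Main St."))   # 12 MAIN STREET
```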

Weight Estimation



[0022] A fundamental component of this exemplary algorithm is the process of estimating the agreement and disagreement weights necessary for the probabilistic function. Weights are calculated based on probabilities of chance agreement using an iterative bootstrap technique.

[0023] The first step in the exemplary weight estimation process is to determine the number of strata required such that the data set can be divided into approximately equal blocks of 1500 rows (Fig. 2 - 201-219); see equation (6).

$$U = \operatorname{int}\left(\frac{\text{Number of Records in Data Set}}{1500}\right) \tag{6}$$

[0024] The source file is then scanned and the records are assigned a random number between 1 and U. A data matrix is created containing a Cartesian product of the records that were assigned a random number of 1. The resulting matrix is then scanned. Each element pair within each record pair is assessed and assigned a value as shown in equation (7).

$$C_n = \begin{cases} 1 & \text{if } A_n = B_n \text{ (Agreement)} \\ -1 & \text{if } A_n \neq B_n \text{ (Disagreement)} \\ 0 & \text{if } A_n = \text{Null and/or } B_n = \text{Null (No decision)} \end{cases} \tag{7}$$

where $A_n$ is the nth element from record A.

[0025] Once the matrix has been fully assessed, the percentages of each value are tabulated and stored. This process may be repeated for a number of iterations (e.g., 15).
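Equation (7) and the tabulation of paragraph [0025] can be sketched as follows. Records are dictionaries with `None` standing for a null element; the field names are hypothetical. One deliberate deviation: the sketch compares each unordered pair of distinct records once (`itertools.combinations`), whereas a literal Cartesian product would also pair every record with itself and count each pair twice.

```python
import itertools

def compare(a, b) -> int:
    """Equation (7): 1 = agreement, -1 = disagreement, 0 = no decision."""
    if a is None or b is None:
        return 0
    return 1 if a == b else -1

def tabulate(block, fields):
    """Percentages of agreement, disagreement and no decision per field over a block."""
    counts = {f: {1: 0, -1: 0, 0: 0} for f in fields}
    pairs = 0
    for rec_a, rec_b in itertools.combinations(block, 2):
        pairs += 1
        for f in fields:
            counts[f][compare(rec_a.get(f), rec_b.get(f))] += 1
    if pairs == 0:
        return {}
    return {
        f: {"agree": c[1] / pairs, "disagree": c[-1] / pairs, "no_decision": c[0] / pairs}
        for f, c in counts.items()
    }

block_1 = [  # the records that drew random number 1
    {"last": "SMITH", "ssn": None},
    {"last": "SMITH", "ssn": "111"},
    {"last": "JONES", "ssn": "222"},
]
print(tabulate(block_1, ["last", "ssn"]))
```

Repeating the process for, say, 15 iterations means redrawing the random numbers, rebuilding block 1 and calling `tabulate` again, keeping each result for the averaging step.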
[0026] Mean percentages of Agreements and No Decisions are calculated for each data element. The p probability, or the reliability, for each data element is then calculated; see equation (8).

$$e = \bar{X}_{\text{Percent No Decision}}, \qquad p = \begin{cases} 1 - e & \text{if } e \geq .99 \\ .99 - e & \text{otherwise} \end{cases} \tag{8}$$



[0027] The $\mu$ probability, or the probability that element n for any given record pair will match by chance, is calculated; see equation (9).

$$\mu = \bar{X}_{\text{Percent Agreement}} \tag{9}$$
[0028] From the $p$ and $\mu$ probabilities, the disagreement and agreement weights may be calculated employing equations (10) and (11), respectively.

$$DW = \log_2\left(\frac{1 - p}{1 - \mu}\right) \tag{10}$$

$$AW = \log_2\left(\frac{p}{\mu}\right) \tag{11}$$
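Given the mean percentages over the iterations, the weight formulas reduce to a few lines. The sketch reads equation (8) as p = 1 − e when e ≥ .99 and p = .99 − e otherwise, and takes μ directly from the mean agreement rate, per equation (9). The clamping constant `EPS`, which keeps both logarithms finite when an element never agrees by chance or is always null, is our own assumption and not part of the described method.

```python
import math
from statistics import mean

EPS = 1e-6  # assumption: keeps log2 arguments strictly positive

def reliability(e: float) -> float:
    """Equation (8): p from the mean No Decision rate e, clamped to (0, 1)."""
    p = 1 - e if e >= 0.99 else 0.99 - e
    return min(max(p, EPS), 1 - EPS)

def field_weights(agree_rates, no_decision_rates):
    """Return (p, mu, AW, DW) for one data element from per-iteration rates."""
    e = mean(no_decision_rates)
    p = reliability(e)
    mu = min(max(mean(agree_rates), EPS), 1 - EPS)     # equation (9)
    aw = math.log2(p / mu)                             # agreement weight
    dw = math.log2((1 - p) / (1 - mu))                 # disagreement weight
    return p, mu, aw, dw

# Hypothetical rates for a "last name" element over three iterations.
p, mu, aw, dw = field_weights([0.02, 0.03, 0.025], [0.05, 0.04, 0.06])
print(f"p={p:.3f} mu={mu:.3f} AW={aw:.2f} DW={dw:.2f}")
```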



Unique Identifier Assignment

[0029] The final stage of this process is the action of uniquely identifying entities within the input data set.
[0030] Each record from the input file is compared against the reference database 115, using a combination of deterministic and probabilistic matching techniques, to determine if the entity represented by the data has been previously identified. If it is judged that the entity is already represented in the reference set, the input record is assigned the unique identifier (UID) from the reference record that it has matched against. If it is judged that the entity represented by the data is not yet in the reference set, a new UID is randomly generated and assigned. Random values may be generated using many different algorithms. As set forth above, if the data is sensitive, it is desirable that the random identifier be truly random, generated, for example, using the instantaneous value of the system clock register. For less sensitive data, reversible methods may be used. It is desirable, however, for the identifier to be unique; only one person should be associated with any one identifier. This random identifier may be numeric, alphanumeric, or symbolic (e.g., a spatial pattern or hologram).

[0031] After the UID assignment occurs, the input record is evaluated, in its entirety, to determine if the record is a unique representation of the entity not already contained in the reference table. If it is a new record, then it is inserted into the reference database 115 for future use.

Deterministic Matching Technique

[0032] The exemplary deterministic matching technique employs simple Boolean logic and is applied after the data has been standardized. Two records are judged to match if certain criteria are met, such as the following:

First Name Matches Exactly
Last Name Matches Exactly
Date of Birth Matches Exactly
Social Security Number OR Member Identifier Matches Exactly

[0033] If two records satisfy the criteria for deterministic matching, no probabilistic processing occurs. However, if no deterministic match occurs, the input record is presented for a probabilistic match.
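The Boolean rule of paragraph [0032] translates directly into code. The sketch assumes the records have already been standardized and uses hypothetical field names; treating a null on either side as a non-match is our assumption, since the text does not say how missing values are handled here.

```python
def deterministic_match(a: dict, b: dict) -> bool:
    """First name, last name and date of birth must all match exactly, and so
    must either the social security number or the member identifier."""
    def same(field: str) -> bool:
        value = a.get(field)
        return value is not None and value == b.get(field)

    return (same("first_name") and same("last_name") and same("dob")
            and (same("ssn") or same("member_id")))

x = {"first_name": "ROBERT", "last_name": "SMITH", "dob": "1960-01-02", "ssn": None, "member_id": "M77"}
y = {"first_name": "ROBERT", "last_name": "SMITH", "dob": "1960-01-02", "ssn": "123456789", "member_id": "M77"}
print(deterministic_match(x, y))  # True: SSN is unknown, but the member identifier agrees
```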

Probabilistic Matching Technique 

[0034] The first step in the probabilistic matching process is to build a set of candidate records from the reference table based on characteristics of specific elements of the input record. This process is referred to as blocking, and the set of candidate records is referred to as the blocking table. Not all data sets use the same characteristics; the elements used in this process are determined through data analysis. It is suggested, however, that the blocking variables include those elements that are somewhat unique to an individual, e.g., social security number, or a combination of date of birth and last name. Upon completion of the construction of the blocking table, each element for each candidate record is compared against its corresponding element from the input record. See equation (12) for the scoring mechanism.




$$S_n = \begin{cases} AW_n & \text{if } A_n = B_n \\ 0 & \text{if } A_n = \text{Null and/or } B_n = \text{Null} \\ DW_n & \text{if } A_n \neq B_n \end{cases} \tag{12}$$

where $A_n$ is the nth element from record A.

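[0035] A composite weight is then calculated for each candidate record; see equation (13).

$$W = \sum_{n=1}^{N} S_n \tag{13}$$

where $S_n$ is the score of the nth element given by equation (12) and $N$ is the number of elements compared.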

[0036] The candidate record with the highest composite weight is then evaluated against a predefined threshold. If the weight meets or exceeds the threshold, the candidate record is judged to match the input record. If the weight falls below the threshold, it is assumed that the input record represents an entity not yet included in the reference set.
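Blocking, the element scores of equation (12), the composite weight of equation (13) and the threshold test of paragraph [0036] can be combined as in the sketch below. It assumes per-element agreement and disagreement weights have already been estimated, represents the blocking criterion as a caller-supplied key function, and uses hypothetical field names and weight values.

```python
def element_score(a, b, aw: float, dw: float) -> float:
    """Equation (12): AW on agreement, DW on disagreement, 0 if either is null."""
    if a is None or b is None:
        return 0.0
    return aw if a == b else dw

def probabilistic_match(input_rec, reference, weights, blocking_key, threshold):
    """Return (best candidate, composite weight) or None if below the threshold.

    weights maps each compared element to its (AW, DW) pair.
    """
    key = blocking_key(input_rec)
    blocking_table = [r for r in reference if blocking_key(r) == key]
    best, best_score = None, float("-inf")
    for cand in blocking_table:
        score = sum(element_score(input_rec.get(f), cand.get(f), aw, dw)   # equation (13)
                    for f, (aw, dw) in weights.items())
        if score > best_score:
            best, best_score = cand, score
    if best is not None and best_score >= threshold:
        return best, best_score
    return None   # treated as an entity not yet in the reference set

weights = {"first": (3.0, -2.0), "dob": (5.0, -4.0), "zip": (2.0, -1.0)}
reference = [
    {"uid": "U1", "last": "SMITH", "first": "ROBERT", "dob": "1960-01-02", "zip": None},
    {"uid": "U2", "last": "SMITH", "first": "ROBERT", "dob": "1971-05-05", "zip": "08540"},
    {"uid": "U3", "last": "JONES", "first": "ROBERT", "dob": "1960-01-02", "zip": "08540"},
]
incoming = {"last": "SMITH", "first": "ROBERT", "dob": "1960-01-02", "zip": "08540"}
match = probabilistic_match(incoming, reference, weights, lambda r: r["last"], threshold=6.0)
print(match[0]["uid"], match[1])  # U1 8.0
```

Because the weights are base-2 logarithms, the threshold is expressed in the same binit units.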
[0037] The exemplary matching technique does not attempt to determine whether two fields that disagree represent the same data. If, for example, because of a transcription error, a social security number of 123 45 6789 were recorded as 123 45 6798, the algorithm set forth above would indicate disagreement. One alternative enhancement to the algorithm set forth above may be to employ some measure of similarity, such as edit distance, between similar fields. For example, the social security numbers described above have an edit distance of one when a transposition of adjacent characters counts as a single edit, because swapping the last two digits would produce the correct result (under the plain Levenshtein distance, which allows only insertions, deletions and substitutions, the distance would be two). This measure of similarity may be employed, for example, as a part of the probabilistic process or as a post-processing step to confirm that the result of the probabilistic process is correct.
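The difference between counting a transposition as one edit and as two is easy to see in code. The function below implements the optimal string alignment variant of the Damerau-Levenshtein distance (and plain Levenshtein when `transpositions=False`); it is a standard textbook dynamic program, offered as one possible similarity measure rather than the one the patent prescribes.

```python
def edit_distance(s: str, t: str, transpositions: bool = True) -> int:
    """Optimal string alignment distance; plain Levenshtein if transpositions=False."""
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                      # delete all of s[:i]
    for j in range(n + 1):
        d[0][j] = j                      # insert all of t[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
            if (transpositions and i > 1 and j > 1
                    and s[i - 1] == t[j - 2] and s[i - 2] == t[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)   # adjacent swap
    return d[m][n]

print(edit_distance("123456789", "123456798"))                        # 1
print(edit_distance("123456789", "123456798", transpositions=False))  # 2
```

A post-processing step could, for example, flag record pairs that disagree only on a social security number at distance 1 for review.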
[0038] Figures 2, 3, 4 and 5 show alternative embodiments for employing a TTP 116 in the anonymous transfer of sensitive information from a data provider 112 to a data user 118. Although each of the embodiments includes a single data provider, it is contemplated that, except for Figure 2, all embodiments may be expanded to include multiple independent information providers. The embodiment shown in Figure 2 may include multiple information sources from a single information provider. One implementation that illustrates multiple information providers is described below with reference to Figure 6.

[0039] In the embodiment shown in Figure 2, a data supplier 112 processes input information in the database 111 to separate the personal data 113 from the other data in the database. The personal data is sent to the TTP 116 for processing, as described above. The TTP 116 returns the personal data with each record now including a unique identifier. The data supplier 112 then matches the unique identifier to the data in the input database 111 and separates the other information and the associated unique identifiers into a depersonalized database 120. This depersonalized database is then sent to the data user 118 for analysis.
[0040] In the exemplary embodiment shown in Figure 2, there is no direct communication between the TTP 116 and the data user 118. This embodiment may be used where a single data provider includes multiple data sources and needs to match the data from the various data sources. One example of this is a hospital environment in which billing records, patient treatment records, pharmacy records, radiology records and therapy records may be kept separately, perhaps by separate contractors. The hospital may want to match these records internally for its own use and may want to provide the data to an external data user. In this embodiment, the TTP 116 matches the records from the various data sources and provides a single unique identifier for each person among all of the sources.

[0041] The exemplary embodiment shown in Figure 3 differs from that shown in Figure 2 in that the TTP 116 does not communicate the unique identifier to the data provider. In this embodiment, the provider 112 processes its input database to generate two databases. One database, 113, has only identifying information and the other database, 120, has only the other information. The data provider assigns common identifiers to corresponding records in the two databases. These identifiers may be as simple as a record number or as complex as a random identifier for a particular individual. In the first instance, the data provider makes no attempt to link multiple records for the same person. In the second instance, the data provider has already linked the records and has placed the unique identifier for the person into both the records of the database 113 and the corresponding records of the database 120. Where the data provider has assigned unique identifiers, the identifiers may be random, pseudo random or reversible. It is noted, however, that reversible unique identifiers may only be used in situations where at least some personal information may be disclosed.
[0042] The database 113 is provided to the TTP 116 where it is processed, as described above, to match records having the same identifying information to each other and to records in the internal database (not shown) of the TTP 116.

[0043] At the same time that the identifying data is sent to the TTP, the database 120 containing the other data is
sent to the data user 118. After receiving the database 120, the data user waits to receive correlating data 310 from the
TTP 116. This correlating data matches the record identifiers or unique identifiers from the data provider to unique
identifiers generated by the TTP. The data user adds the unique identifiers generated by the TTP 116 to the appropriate
records of the database 120 and processes the other information using the TTP unique identifiers.
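The data user's step in [0043] can be sketched as a lookup, assuming (hypothetically) that correlating data 310 arrives as a mapping from the provider's record identifier to the TTP's unique identifier. The description does not say how records without a correlating entry are handled; this sketch keeps them and marks the TTP identifier as `None`.

```python
from typing import Any, Optional

Record = dict[str, Any]


def attach_ttp_ids(other_records: list[Record],
                   correlating_data: dict[str, str]) -> list[Record]:
    """Add the TTP unique identifier to each record of database 120.

    correlating_data maps provider record identifiers to TTP identifiers
    (an assumed format for correlating data 310).
    """
    linked: list[Record] = []
    for record in other_records:
        ttp_id: Optional[str] = correlating_data.get(record["id"])
        # Unmatched records are kept with ttp_id=None (assumption).
        linked.append({**record, "ttp_id": ttp_id})
    return linked


rows = [{"id": "0", "diagnosis": "J45"}, {"id": "1", "diagnosis": "E11"}]
print(attach_ttp_ids(rows, {"0": "T-7f3a"}))
# [{'id': '0', 'diagnosis': 'J45', 'ttp_id': 'T-7f3a'}, {'id': '1', 'diagnosis': 'E11', 'ttp_id': None}]
```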
[0044] When the system shown in Figure 3 is used with multiple data providers, the correlating data 310 provided
by the TTP 116 may also include a table indicating correspondence among the unique identifiers or record numbers
provided by the multiple data providers. Using this information, the data user 118 may associate data from the multiple
providers before performing the data analysis. The system shown in Figure 4 is similar to that described above with
reference to Figure 2 except that, in the system of Figure 4, there is communication between the TTP 116 and the data
user 118. In Figure 4, the data supplier sends the identifying information to the TTP 116, which matches the data, adds
unique identifiers and sends the identifying information with the unique identifiers back to the data supplier 112. The
data supplier then copies the unique identifiers from the identifying information records to the associated other
information records and provides the other information records to the data user 118. The data user 118 then receives
correlating data 410 directly from the TTP 116. In this instance, the correlating information includes unique identifiers
from other data suppliers that correspond to the unique identifiers in the depersonalized data 120 that is provided by
the data supplier 112.

[0045] In the system shown in Figure 4, this correlating data 410 may be provided by the TTP 116 to the data user
118 at the request of the data provider 112 or it may be requested by the data user 118. When the data is requested by
the data provider, the TTP provides correlating information for all of the data suppliers in its database. When the data
user asks for data, however, it requests information from only those data providers from which it receives data.
[0046] Figure 5 shows a system that is similar to the system shown in Figure 3 except that, rather than sending all
correlating data to the data user, the TTP 116 sends correlating data to the data user 118 only in response to a specific
request. As with the system shown in Figure 4, that request may be for only those data providers who supply data to
the data user 118.
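The request-driven variants of [0045] and [0046] amount to filtering the TTP's correlation table down to the providers named in the request. The sketch below assumes a hypothetical table layout keyed by TTP unique identifier, mapping each provider to that provider's identifier for the same individual; the function name is illustrative only.

```python
def correlating_data_for(
    table: dict[str, dict[str, str]], requested: set[str]
) -> dict[str, dict[str, str]]:
    """Keep only the correlations for the data providers named in a request.

    table maps a TTP unique identifier to {provider name: that provider's
    identifier for the same individual} (an assumed layout).
    """
    result: dict[str, dict[str, str]] = {}
    for ttp_id, by_provider in table.items():
        subset = {p: pid for p, pid in by_provider.items() if p in requested}
        if subset:  # omit individuals unknown to every requested provider
            result[ttp_id] = subset
    return result


table = {
    "T1": {"112a": "a1", "112b": "b4", "112c": "c9"},
    "T2": {"112c": "c2"},
}
print(correlating_data_for(table, {"112a", "112b"}))
# {'T1': {'112a': 'a1', '112b': 'b4'}}
```

A data user that receives data only from providers 112a and 112b thus never learns which identifiers provider 112c uses.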

[0047] In any of the systems shown in Figures 1 through 5, it may be necessary for the data user to identify the
person whose data is being evaluated. If, for example, the data user 118 is processing medical data and identifies a
life-threatening condition, the data user may need to notify the individual. In this instance, the data user may ask the
data supplier for the identifying information. In situations where the unique identifiers being used by the data user do
not match the identifiers held by the data provider, the data provider 112 may then authorize the TTP 116 to divulge
the information to the data user 118.

[0048] Figure 6 illustrates another exemplary embodiment using the principles of the present invention. In this
embodiment, the Trusted Third Party 116 provides each data provider 112a, 112b and 112c with software and/or
hardware that performs the depersonalizing process and a supporting database 115a, 115b and 115c that holds the
identified depersonalized data. Each database 115a, 115b and 115c contains individual identifiable attributes and
individual identifiers for the respective data provider 112a, 112b and 112c obtained from a central database 115 owned
or controlled by the TTP 116. The central database 115 is populated with information obtained from authorized sources
of such information during past processing. For each record the data provider wishes to supply to a data user 118, the
data provider extracts the identifying fields for the record and inputs them into the depersonalizing process. The
depersonalizing process assigns the random identifier by matching the information held by the data user with
information previously stored in the database provided by the Trusted Third Party. If no matching data is found in the
respective database 115a, 115b or 115c, a unique and possibly random identifier is assigned and provided as output
from the process. If a match with previously depersonalized data is encountered, the unique identifier assigned
initially is provided as output from the process. The data providers 112a, 112b and 112c substitute the unique
identifiers for the individual identifiable attributes in the record to create respective depersonalized records. The
data suppliers then send the depersonalized records to the data user 118.
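The lookup-or-assign behaviour of the provider-side depersonalizing process in [0048] can be sketched as follows. The TTP's actual matching algorithm and identifier format are not specified; this sketch assumes exact matching on whitespace- and case-normalized identifying fields, uses Python's `secrets` module for the random identifier, and the class name `Depersonalizer` is hypothetical.

```python
import secrets
from typing import Any, Optional

Record = dict[str, Any]


class Depersonalizer:
    """Hypothetical provider-side depersonalizing process (cf. Figure 6)."""

    def __init__(self, known: Optional[dict[tuple, str]] = None) -> None:
        # Normalized identifying key -> identifier first assigned to it;
        # stands in for the local database 115a, 115b or 115c.
        self.known: dict[tuple, str] = dict(known or {})

    @staticmethod
    def _key(fields: dict[str, Any]) -> tuple:
        # Sort by field name so the key does not depend on field order.
        return tuple((k, str(v).strip().lower()) for k, v in sorted(fields.items()))

    def identifier_for(self, fields: dict[str, Any]) -> str:
        key = self._key(fields)
        if key not in self.known:
            # No previous match: assign a new random identifier.
            self.known[key] = secrets.token_hex(16)
        # On a match, the identifier assigned initially is returned.
        return self.known[key]

    def depersonalize(self, record: Record, identifying: tuple[str, ...]) -> Record:
        """Substitute the record's identifier for its identifying attributes."""
        fields = {f: record[f] for f in identifying}
        out = {k: v for k, v in record.items() if k not in identifying}
        out["uid"] = self.identifier_for(fields)
        return out


d = Depersonalizer()
a = d.depersonalize({"name": "Ann Lee", "zip": "10001", "item": "tent"}, ("name", "zip"))
b = d.depersonalize({"name": " ann lee", "zip": "10001", "item": "stove"}, ("name", "zip"))
print(a["uid"] == b["uid"])  # True
print("name" in a)           # False
```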

[0049] In order to enable the linking of multiple sources of depersonalized data, each data provider 112a, 112b and
112c supplies, to the TTP 116, a file containing the identifying data and the unique identifiers assigned by the data
provider's depersonalizing process 116a, 116b and 116c. The TTP correlates these files to identify matches among the
identifying information records provided by the respective data providers and stores the unique identifiers, with
indications of any correlation, within the central database. When authorized by the data provider, the TTP may supply
information to the data user showing the random identifiers from any of the data providers that relate to the same
individual, thus allowing the data user to create a linked depersonalized database 120.
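The TTP's correlation step in [0049] can be sketched as grouping provider-assigned identifiers by matching identifying data. This is an illustration under stated assumptions: each provider's file is modelled as (unique identifier, identifying fields) pairs, matching is exact on normalized fields rather than the TTP's undisclosed algorithm, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Any

Fields = dict[str, Any]


def normalize(fields: Fields) -> tuple:
    """Exact-match key on whitespace/case-normalized fields (assumption)."""
    return tuple((k, str(v).strip().lower()) for k, v in sorted(fields.items()))


def correlate(files: dict[str, list[tuple[str, Fields]]]) -> list[dict[str, list[str]]]:
    """Group the identifiers that the providers assigned to one individual.

    files maps a provider name to (unique identifier, identifying fields)
    pairs, i.e. the file each provider sends to the TTP.
    """
    groups: dict[tuple, dict[str, set[str]]] = defaultdict(lambda: defaultdict(set))
    for provider, rows in files.items():
        for uid, fields in rows:
            groups[normalize(fields)][provider].add(uid)
    # Only groups holding more than one identifier carry linking information.
    return [
        {p: sorted(uids) for p, uids in by_provider.items()}
        for by_provider in groups.values()
        if sum(len(uids) for uids in by_provider.values()) > 1
    ]


files = {
    "112a": [("a1", {"name": "Ann Lee"}), ("a2", {"name": "Bob Ray"})],
    "112b": [("b7", {"name": "ann lee "})],
}
print(correlate(files))  # [{'112a': ['a1'], '112b': ['b7']}]
```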

[0050] In some instances, a data provider 112a will not supply the identifying data to the TTP 116. In this instance,
the TTP 116 will maintain a central database that is pre-populated with data from public sources, such as telephone
directories, and will supply the matching algorithms to the data provider. The TTP 116 will receive only those files from
a data supplier that have been previously matched with the TTP 116 database. It is apparent that correlation of data
for certain groups of individuals who do not appear in the public databases, such as children, may then be unavailable
to the data user. However, the process favors false negative correlation over false positive correlation.
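One way to read the preference for false negatives in [0050] is a matcher that declares a match only when exactly one reference entry agrees on every supplied field, and otherwise returns nothing. This is an assumption, not the TTP's matching algorithm, which the description does not disclose; the reference list stands in for the central database pre-populated from public sources, and `conservative_match` is a hypothetical name.

```python
from typing import Any, Optional

Fields = dict[str, Any]


def _norm(fields: Fields) -> dict[str, str]:
    return {k: str(v).strip().lower() for k, v in fields.items()}


def conservative_match(fields: Fields,
                       reference: list[tuple[str, Fields]]) -> Optional[str]:
    """Return the reference identifier only for an unambiguous exact match.

    Missing or ambiguous candidates yield None: a false negative is
    preferred to linking records of two different people.
    """
    target = _norm(fields)
    hits = [tid for tid, ref in reference
            if all(_norm(ref).get(k) == v for k, v in target.items())]
    return hits[0] if len(hits) == 1 else None


ref = [("T1", {"name": "Ann Lee", "city": "Oslo"}),
       ("T2", {"name": "Ann Lee", "city": "Bergen"})]
print(conservative_match({"name": "ann lee", "city": "Oslo"}, ref))  # T1
print(conservative_match({"name": "Ann Lee"}, ref))                  # None (ambiguous)
print(conservative_match({"name": "Child Doe"}, ref))                # None (not listed)
```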

[0051] A practitioner skilled in the art would recognize the many permutations of the basic concept of the present
invention, that is, the use of a trusted third party with a data provider and a data user to depersonalize data as the data
passes from provider to user. The embodiments described above are exemplary in nature, and do not constitute an
exhaustive listing of the various ways this invention may be implemented.

[0052] Figure 7 is a block diagram of an exemplary physical implementation of any of the information networks
shown in Figures 1 through 6. The exemplary system is linked by a local area or wide area network 716 which may also
be connected to a global information network, such as the Internet, by a direct communications interface 718 and by
removable media 722. The exemplary system shown in Figure 7 includes six processing systems, 710, 730, 740, 760,
770 and 780. Each of these systems may include any of the communication interfaces shown for processing system
710. Each of the systems 710, 730, 740, 760, 770 and 780 has an associated database 712, 732, 742, 762, 772 and
782. The databases maintained by the data provider, data user and TTP may reside on any commercially available host
computer, as currently known in the art.

[0053] The exemplary processing system 710 includes a host computer 714 and a network interface 716 by which
the host computer 714 may communicate with other data processing systems via a local area network, a wide area
network or a global information network. As shown in Figure 7, the host computer 714 communicates with the processing
systems 740 and 730 via a local area network (LAN) 717. Computer 714 also uses the LAN 717 to communicate with
a global information network server 750 and, through the server 750 and global information network 752, with remote
users 760 and 780. In addition to the network interface, the host computer 714 of the data processing system 710
includes a communications interface 718, for example, a modem, through which the processing system 710 may
communicate with the remote user 770. The processing system 710 also includes an input/output (I/O) processor 720
which is coupled to a removable media device 722, for example a diskette drive, through which the host computer can
communicate with any other computer system that does not have a direct or indirect data communication path with the
host computer 714.

[0054] Each host computer may contain one or more processors (not shown), memory (not shown), input and
output devices (not shown), and access to mass storage (not shown). Each processing system may be a single system or
a network of computers, as currently known in the art. The data providers, TTP and data users may exchange data over
a computer network such as LAN 717 or by physically transferring data on removable media 722 from location to location.
The system may also be implemented across a global information network such as the Internet. The host computer and
the global information network may also communicate with a plurality of remote users.

[0055] The term "database" may be broadly interpreted to mean any database using records and fields, or their
equivalent. The method is not limited by the high-level language used to code the data or the language used to code
the programs which implement the required data processing. It is contemplated that the subject invention may be
practiced in computer software executed by the data provider(s) 112, trusted third party 116 and data user 118. This
computer software may be implemented on a carrier, such as a diskette, CD-ROM, DVD-ROM or radio frequency or audio
frequency carrier wave.

[0056] Figures 8 and 9 are flow-chart diagrams which illustrate exemplary embodiments of the invention. Figure 8
illustrates a process such as that shown in Figure 6 and Figure 9 shows a process such as that shown in Figures 3, 4
or 5.

[0057] In Figure 8, at step 810, the TTP 116 provides the encoding process and encoding database to two retailers,
retailer 112a and retailer 112b. The retailers implement the process and database within their company. The databases
115a and 115b provided by the TTP 116 in this exemplary embodiment of the invention are pre-populated with
information supplied from the TTP central database 115. The information provided does not include any unique
identifiers. At step 812, each of the retailers 112a and 112b extracts the individual demographic attributes and
individual identifiers from each data record it wishes to send to the data user 118, in this example, a marketing agency.
For each record, the information is processed through the TTP's supplied encoding process. The encoding process, at
step 814, assigns a unique identifier to each record. Next, at step 816, the retailers 112a and 112b create the
depersonalized data by replacing the individual demographic attributes and individual identifiers with the single unique
identifier provided by the encoding process and send the depersonalized data to the marketing agency 118.

[0058] Next, at step 818, the retailers 112a and 112b send, to the TTP 116, the unique identifiers assigned for each
record where they encountered a match during the encoding process execution. The TTP 116, at step 820, stores the
unique identifier assignment information provided by the retailers 112a and 112b in its central database 115. Also at
step 820, the TTP 116 sends to the marketing agency 118, as the correlating information, the unique identifiers from
the retailers 112a and 112b that link to the same individual.

[0059] At step 822, the marketing agency links the data using the correlating information and performs its marketing
study. This study is performed without the ability to identify any individual person. As illustrated by the arrow from
block 822 to block 812, the process is iterative. Periodically, the TTP 116 sends updates to the encoding process and
database to the retailers 112a and 112b. These updates result from updates and additions to the encoding process
central database obtained by TTP 116. After processing these updates, the retailers 112a and 112b send back to the
TTP 116 all unique identifiers that were previously assigned by the retailers to the newly supplied information.
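The linking performed by the marketing agency at step 822 can be sketched as follows. The sketch assumes, hypothetically, that the correlating information arrives as a list of groups, each mapping a retailer to its identifiers for one (still anonymous) individual, and that each depersonalized record carries its identifier under `uid`. Records with no correlation keep a key derived from their own identifier.

```python
from collections import defaultdict
from typing import Any

Record = dict[str, Any]


def link_records(records: dict[str, list[Record]],
                 groups: list[dict[str, list[str]]]) -> dict[str, list[Record]]:
    """Merge depersonalized records that the correlating information says
    belong to the same individual, without revealing who that is."""
    # (provider, uid) -> linked-person key for every correlated identifier.
    person_of: dict[tuple[str, str], str] = {}
    for index, group in enumerate(groups):
        for provider, uids in group.items():
            for uid in uids:
                person_of[(provider, uid)] = f"P{index}"
    linked: dict[str, list[Record]] = defaultdict(list)
    for provider, rows in records.items():
        for row in rows:
            key = person_of.get((provider, row["uid"]), f"{provider}:{row['uid']}")
            linked[key].append({**row, "source": provider})
    return dict(linked)


records = {
    "112a": [{"uid": "a1", "item": "tent"}, {"uid": "a2", "item": "boots"}],
    "112b": [{"uid": "b7", "item": "stove"}],
}
linked = link_records(records, [{"112a": ["a1"], "112b": ["b7"]}])
print(sorted(linked))                     # ['112a:a2', 'P0']
print([r["item"] for r in linked["P0"]])  # ['tent', 'stove']
```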
[0060] It is noted that in this embodiment of the invention, the retailers 112a and 112b never provided any
identifiable retail information. The retail data provided by the retailers to the marketing agency had no individual
identifiable attributes. Thus, the marketing agency 118 never knew the identity of the actual individuals. Nonetheless, the
marketing agency 118 was able to use the power of the retailers' information to enhance its marketing study capability.
[0061] In the exemplary embodiment of the invention shown in Figure 9, a manufacturer 118 wishes to use the
healthcare information of three local healthcare providers to identify the health habits of a specific disease state. Three
data providers 112, ProviderA, ProviderB and ProviderC, have information which identifies the individual (for example:
member number, social security number, name, etc.). The manufacturer 118, ProviderA, ProviderB and ProviderC
contractually authorize a Trusted Third Party (TTP) 116 to encode the healthcare data using the healthcare data encoding
process shown in Figure 9.

[0062] At step 910 of this process, ProviderA, ProviderB and ProviderC each extract the individual identifiable
information from their internal databases 111 of healthcare records into a file 113. At step 912, ProviderA, ProviderB
and ProviderC send the files to TTP 116.

[0063] At step 914, the TTP 116 identifies each individual using its matching process and assigns an Encoding Key
to each record. At step 916, the TTP 116 sends the files with the corresponding Encoding Keys back to ProviderA,
ProviderB and ProviderC. Next, at step 918, ProviderA, ProviderB and ProviderC replace the individual attributes for each
record they wish to send to the manufacturer 118 with the Encoding Key received from the TTP 116. Also at step 918,
ProviderA, ProviderB and ProviderC send the encoded healthcare information files to the manufacturer 118. At step
920, the manufacturer receives the encoded healthcare information files and obtains the correlating data from the TTP
116. Finally, at step 922, the manufacturer 118 links the data from ProviderA, ProviderB and ProviderC and completes
its study. It is noted that this study is completed without the manufacturer being able to identify any person.
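The TTP side of steps 914 to 920 can be sketched in Python. The description does not say whether the Encoding Keys for one individual differ between providers; because the manufacturer still obtains correlating data at step 920, this sketch assumes they do (as in claims 4 and 5). Exact matching on normalized fields stands in for the TTP's matching process, and `encode_files` is a hypothetical name.

```python
import secrets
from collections import defaultdict
from typing import Any

Fields = dict[str, Any]


def encode_files(
    files: dict[str, list[tuple[str, Fields]]]
) -> tuple[dict[str, dict[str, str]], list[dict[str, str]]]:
    """Sketch of steps 914-916 plus the correlating data used at step 920.

    files maps each provider to (record number, identifying fields) pairs.
    Returns (keys, correlating):
      keys[provider][record number] -> Encoding Key sent back to that provider;
      correlating -> one {provider: key} dict per individual seen by 2+ providers.
    """
    per_person: dict[tuple, dict[str, str]] = defaultdict(dict)
    keys: dict[str, dict[str, str]] = {}
    for provider, rows in files.items():
        keys[provider] = {}
        for record_no, fields in rows:
            person = tuple((k, str(v).strip().lower()) for k, v in sorted(fields.items()))
            # One random key per provider and individual; repeat records reuse it.
            if provider not in per_person[person]:
                per_person[person][provider] = secrets.token_hex(8)
            keys[provider][record_no] = per_person[person][provider]
    correlating = [dict(m) for m in per_person.values() if len(m) > 1]
    return keys, correlating


keys, corr = encode_files({
    "ProviderA": [("1", {"ssn": "123-45-6789", "name": "Ann Lee"})],
    "ProviderB": [("9", {"ssn": "123-45-6789", "name": "ANN LEE"})],
    "ProviderC": [("4", {"ssn": "987-65-4321", "name": "Bob Ray"})],
})
print(len(corr))        # 1
print(sorted(corr[0]))  # ['ProviderA', 'ProviderB']
```

Each provider receives only its own keys; only the correlating list, sent to the manufacturer, ties ProviderA's key to ProviderB's.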

[0064] While the invention has been described in terms of a number of exemplary embodiments, it is contemplated
that it may be practiced as described above with variations that are within the scope of the appended claims.

Claims 

1. A method of distributing data records, which include identifying information fields and other data fields, in an
information network comprising a data provider, a data user and a trusted third party, wherein the identifying information
in each record identifies a person, said method comprising the steps of:

a) separating the identifying information fields from the other data fields for each data record to generate
identifying records;

b) transferring a copy of the identifying records to the trusted third party;

c) associating, by the trusted third party, each of the identifying records with a unique identifier, wherein a
respectively different unique identifier is assigned to each person identified by one or more of the identifying
records;

d) transferring, by the trusted third party, the unique identifiers to the data provider;

e) associating, by the data provider, the other data fields with the respective unique identifiers to form
depersonalized data; and

f) transferring, by each of the data providers, the depersonalized data to the data user.

2. A method according to claim 1 wherein the step of associating the identifying records by the trusted third party
includes the step of generating a random identifier that cannot be used to recover any of the identifying information
fields as the unique identifier.

3. A method of distributing data records, which include identifying information fields and other data fields, in an
information network comprising a plurality of data providers, a data user and a trusted third party, wherein the
identifying information in each data record identifies a person, said method comprising the steps of:

a) separating, by each of the data providers, the identifying information fields from the other data fields for each
data record to generate identifying records;

b) transferring, by each of the data providers, a copy of the identifying records to the trusted third party;

c) associating, by the trusted third party, each of the identifying records, with a unique identifier, wherein a
respectively different unique identifier is assigned to each individual person identified by one or more of the
identifying records;

d) transferring, by the trusted third party, the unique identifiers to the respective data providers from which the
identifying records used to generate the unique identifiers were received;

e) associating, by each of the data providers, the other data fields with the respective unique identifiers to form
depersonalized data; and

f) transferring, by each of the data providers, the depersonalized data to the data user.

4. A method according to any one of claims 1 to 3 wherein the step of associating, by the trusted third party, each of
the identifying records, with a unique identifier, includes the step of generating a random identifier that cannot be
used to recover any of the identifying information fields as the unique identifier, wherein when the identifying
information fields provided by more than one of the plurality of data providers correspond to one person, respectively
different unique identifiers are generated for each of the more than one information providers.

5. A method according to any one of claims 1 to 4 wherein the step of associating, by the trusted third party, each of
the identifying records, with a unique identifier further includes the steps of:

a) recording, by the trusted third party, a correlation of each person for whom multiple unique identifiers are
assigned to form correlating information; and

b) transferring, by the trusted third party, the correlating information to the data user.

6. A method according to claim 5 wherein the step of transferring, by the trusted third party, the correlating information
to the data user, includes the steps of:

a) receiving, from the data user, a request for correlating information for specific ones of the plurality of data
providers; and

b) transferring the correlating information for only the specific ones of the plurality of data providers.

7. A method of distributing a plurality of data records, which include identifying information fields and other data fields,
in an information network comprising a plurality of data providers, a data user and a trusted third party, wherein the
identifying information in each data record identifies a person, said method comprising the steps of:

a) generating, by each of the data providers, a plurality of first unique identifiers from the identifying information
fields of the plurality of data records;

b) transferring, by each of the data providers, a copy of the identifying information fields from each of the
plurality of data records and a respective copy of each of the plurality of unique identifiers, as a respective plurality
of identifying records, to the trusted third party;

c) transferring, by each of the data providers, a copy of the other data fields from each of the plurality of data
records and a respective copy of each of the plurality of first unique identifiers, as a respective plurality of data
records, to the data user;

d) associating, by the trusted third party, each of the identifying records, with a second unique identifier,
wherein a respectively different second unique identifier is assigned to each individual person identified by one
or more of the identifying records;

e) transferring, by the trusted third party, the first unique identifiers and the second unique identifiers to the data
user; and

f) associating, by the data user, the other data records provided by the data provider with the unique identifiers
provided by the trusted third party.

8. A method of processing and distributing a plurality of data records, wherein each of the plurality of data records
contains information used to identify a person, by a trusted third party, said method comprising the steps of:

a) receiving, from a plurality of data providers, a copy of the plurality of identifying records;

b) associating each of the identifying records, with a unique identifier, wherein a respectively different unique
identifier is assigned to each individual person identified by one or more of the identifying records;

c) matching records associated with a particular person among the identifying records provided by the plurality
of data providers, to generate the second unique identifier which is the same for all identifying records provided
by the plurality of data providers; and

d) transferring the unique identifiers to the respective data providers from which the identifying records used to
generate the unique identifiers were received.

9. A carrier containing a set of instructions for causing a general purpose computer network comprising a data
provider, a data user and a trusted third party, said network accessing a plurality of data records which include
identifying information fields and other data fields, wherein the identifying information in each record identifies a
person, to perform the following steps:

a) separating the identifying information fields from the other data fields for each data record to generate
identifying records;

b) transferring a copy of the identifying records to the trusted third party;

c) associating, by the trusted third party, each of the identifying records with a unique identifier, wherein a
respectively different unique identifier is assigned to each person identified by one or more of the identifying
records;

d) transferring, by the trusted third party, the unique identifiers to the data provider;

e) associating, by the data provider, the other data fields with the respective unique identifiers to form
depersonalized data; and

f) transferring, by each of the data providers, the depersonalized data to the data user.

10. A carrier according to claim 9 wherein the step of associating the identifying records by the trusted third party
includes the step of generating a random identifier that cannot be used to recover any of the identifying information
fields as the unique identifier.

11. A carrier containing a set of instructions for causing a network of general purpose computers comprising a plurality
of data providers, a data user and a trusted third party, said network accessing a plurality of data records which
include identifying information and other fields, wherein the identifying information in each data record identifies a
person, to perform the following steps:

a) separating, by each of the data providers, the identifying information fields from the other data fields for each
data record to generate identifying records;

b) transferring, by each of the data providers, a copy of the identifying records to the trusted third party;

c) associating, by the trusted third party, each of the identifying records, with a unique identifier, wherein a
respectively different unique identifier is assigned to each individual person identified by one or more of the
identifying records;

d) transferring, by the trusted third party, the unique identifiers to the respective data providers from which the
identifying records used to generate the unique identifiers were received;

e) associating, by each of the data providers, the other data fields with the respective unique identifiers to form
depersonalized data; and

f) transferring, by each of the data providers, the depersonalized data to the data user.

12. A carrier according to claim 11 wherein the step of associating, by the trusted third party, each of the identifying
records, with a unique identifier, includes the step of generating a random identifier that cannot be used to recover
any of the identifying information fields as the unique identifier, wherein when the identifying information fields
provided by more than one of the plurality of data providers correspond to one person, respectively different unique
identifiers are generated for each of the more than one information providers.

13. A carrier containing a set of instructions for causing a network of general purpose computers, said network
comprising a plurality of data providers, a data user and a trusted third party, said network accessing a plurality of data
records which include identifying information fields and other data fields, wherein the identifying information in each
data record identifies a person, to perform a method comprising the steps of:

a) generating, by each of the data providers, a plurality of first unique identifiers from the identifying information
fields of the plurality of data records;

b) transferring, by each of the data providers, a copy of the identifying information fields from each of the
plurality of data records and a respective copy of each of the plurality of unique identifiers, as a respective plurality
of identifying records, to the trusted third party;

c) transferring, by each of the data providers, a copy of the other data fields from each of the plurality of data
records and a respective copy of each of the plurality of first unique identifiers, as a respective plurality of data
records, to the data user;

d) associating, by the trusted third party, each of the identifying records, with a second unique identifier,
wherein a respectively different second unique identifier is assigned to each individual person identified by one
or more of the identifying records;

e) transferring, by the trusted third party, the first unique identifiers and the second unique identifiers to the data
user; and

f) associating, by the data user, the other data records provided by the data provider with the unique identifiers
provided by the trusted third party.

14. The carrier of claim 13 further comprising instructions to perform the steps of matching records associated with a
particular person among the identifying records provided by the plurality of data providers, to generate the second
unique identifier which is the same for all identifying records provided by the plurality of data providers, wherein the
matching is performed by the trusted third party.

15. A carrier containing a set of instructions for causing a general purpose computer accessing a plurality of data
records, wherein each of the plurality of data records contains information used to identify a person, by a trusted
third party, to perform the steps of:

a) receiving a plurality of identifying records from a first data provider;

b) associating each of the plurality of identifying records with a unique identifier, wherein a respectively different
unique identifier is assigned to each person identified by one or more of the plurality of identifying records; and

c) transferring the unique identifiers to the data provider.

16. A carrier according to claim 15 wherein the step of associating the identifying records includes the step of
generating a random identifier that cannot be used to recover any of a plurality of identifying information fields as the
unique identifier.

17. A carrier containing a set of instructions for causing a general purpose computer accessing a plurality of data
records wherein each of the plurality of data records contains information used to identify a person by a trusted third
party, to perform the steps of:

a) receiving, from a plurality of data providers, a copy of the plurality of identifying records;

b) associating each of the identifying records, with a unique identifier, wherein a respectively different unique
identifier is assigned to each individual person identified by one or more of the identifying records;

c) matching records associated with a particular person among the identifying records provided by the plurality
of data providers, to generate the second unique identifier which is the same for all identifying records provided
by the plurality of data providers; and

d) transferring the unique identifiers to the respective data providers from which the identifying records used to
generate the unique identifiers were received.

18. A carrier according to claim 17 wherein the step of associating, by the trusted third party, each of the identifying
records, with a unique identifier, includes the step of generating a random identifier that cannot be used to recover
any of the identifying information fields as the unique identifier, wherein when the identifying information fields
provided by more than one of the plurality of data providers correspond to one person, respectively different unique
identifiers are generated for each of the more than one information providers.
[Flowchart text recovered from the drawings: [...] data from providers and assigns unique identifiers; TTP sends data with unique identifiers back to providers; providers prepare data to send to data users, replacing identifying data with the unique identifier; data users obtain correlating data from [...]; data users link data from providers and perform analysis.]